Entry Name:  "KUL-Chua-MC2"

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Alvin Chua, KU Leuven, alvin.chua@asro.kuleuven.be PRIMARY

Ryo Sakai, KU Leuven, ryo.sakai@esat.kuleuven.be

Jan Aerts, KU Leuven, jan.aerts@esat.kuleuven.be

Andrew Vande Moere, KU Leuven, andrew.vandemoere@asro.kuleuven.be

 

Student Team:  Yes

 

Analytic Tools Used:

Our procedure consists of three stages: (1) Aggregate & Slice, (2) Design, Filter & Analyze and (3) Communicate. Stage 1 is concerned with rapidly discovering insights and makes use of openly available software to test hypotheses. Stage 2 involves the design and implementation of streamlined tools to optimize the identification of specific patterns. Finally, in stage 3 we present our discoveries as simplified abstractions so that they can be easily understood.

Stage 1: R, QGIS

Stage 2: A series of visualization tools developed in Processing by Ryo Sakai and Alvin Chua at DataVisLab, KU Leuven, consisting of two interactive visualizations and one static visualization. Our first interactive visualization links parallel coordinates to a map, while our second links a timeline to histograms and an origin-destination (OD) map. Finally, our static visualization represents the information displayed on the timeline as a flow diagram to optimize visual search.

Stage 3: Graphviz

 

Approximately how many hours were spent working on this submission in total?

96 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete?  Yes

 

Video:

https://www.dropbox.com/s/ejiwwncza6xgzy2/KUL-Chua-MC2.mp4

 

KUL-Chua-MC2

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?  Please limit your response to no more than five images and 300 words.

 

Following the three-stage process described above, we used R to quickly visualize the GPS data. Fig.1a shows a relationship between employment type and spatial activity. Fig.1b indicates that spatial activity on weekdays follows a routine pattern, in contrast to the more volatile movements on weekends. A key discovery is that engineers, executives and IT staff move across the map in a distinctly different pattern from security and facilities staff. We further processed the data so that movements can be represented as an OD map, where the average daily pattern is expressed as a series of movements between locations. Locations for the OD map are detected from the time intervals in the GPS data where a car is stationary for more than 70 seconds (fig.2a). Detected locations are then clustered by proximity and validated against the points of interest (POI) on the tourist map (fig.2b). Fig.3 shows our OD map, where edges are weighted and filtered according to the combination of (1) the number of employees moving from one cluster to another and (2) the number of days on which the connection between two clusters exists. Maps are generated for each employment group based on the average number of employees and days that the employment type has been active. Finally, we present the average daily routine of GAStech employees as directed graphs (fig.4). Each node represents a cluster and edges encode movement between clusters. Edge labels indicate the median hour of day at which movement occurs. We observe that GAStech is the power centre of the average daily routine, suggesting that it serves as a major hub where the majority of transitions between activities occur. All employment types apart from facilities share similar lunch hours and tend to visit common locations.

 

a) Movement Patterns by Employment Type

b) Movement Patterns by Employment Type & Day of Week

 

Fig.1. Comparison of movement patterns by employment type. The NA type comprises GPS data from trucks that do not have driver information. a) Aggregation reveals how movement differs between employment types and shows the regions of the map that are more frequently traversed. b) Small multiples of the movement patterns by day of week reveal routine patterns on weekdays and more volatile movements on weekends.

 

 

 

Fig.2. Locations are detected from time intervals in the GPS data and validated against the provided tourist map. a) Locations on the map distilled from GPS data where a car is stationary for more than 70 seconds. b) The locations are then clustered within a 10m radius and overlaid on the tourist map.
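
The detection and clustering steps in Fig.2 can be sketched in Python as follows. This is a minimal illustration, not the tool used for the submission. It assumes that the GPS log of one car is a time-sorted list of (timestamp, latitude, longitude) records, that the tracker is silent while the car is parked so that a stop appears as a gap of more than 70 seconds between consecutive records, and that a simple greedy pass is an acceptable stand-in for the clustering. All function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

STOP_GAP = timedelta(seconds=70)   # threshold stated in Fig.2
RADIUS_M = 10.0                    # clustering radius stated in Fig.2


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def detect_stops(track):
    """Return (arrive, depart, (lat, lon)) for every gap longer than STOP_GAP.

    `track` is a time-sorted list of (timestamp, lat, lon) for one car
    (assumed input layout).
    """
    stops = []
    for (t0, lat, lon), (t1, _, _) in zip(track, track[1:]):
        if t1 - t0 > STOP_GAP:
            stops.append((t0, t1, (lat, lon)))
    return stops


def cluster_stops(points, radius_m=RADIUS_M):
    """Greedy clustering: join the first cluster whose centroid is in range."""
    clusters = []  # each cluster is a list of (lat, lon)
    for p in points:
        for members in clusters:
            centroid = (sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members))
            if haversine_m(p, centroid) <= radius_m:
                members.append(p)
                break
        else:
            clusters.append([p])
    return clusters
```

The greedy pass depends on input order; a density-based method would be a reasonable alternative if cluster boundaries matter.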

 

 

Panels: Engineering, Security, Executives, Information Technology, Facilities

Fig.3. Locations detected in fig.2 are used as waypoints to generate an OD map. Maps on the left visualize the complete movement records from the GPS data while maps on the right show the routine movements between clusters.

 

 

Panels: Engineering, Security, Executives, Information Technology, Facilities

Fig.4. The daily routine of GAStech employees presented as a directed graph. Each node represents a location and implies an activity undertaken in that locale. Edges encode the transition between locations. Edge labels indicate the median hour of day at which the transition occurs.
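
The edge weighting behind the OD maps and the median-hour labels in the directed graphs can be sketched as below. This is an illustrative reconstruction under stated assumptions: visits are given as (employee, day, hour, cluster) tuples, consecutive visits by the same employee on the same day count as one movement, and the two thresholds in routine_edges are hypothetical parameters rather than the values used in our maps.

```python
from collections import defaultdict
from statistics import median


def od_edges(visits):
    """Summarise movements between consecutive clusters.

    `visits` is a list of (employee, day, hour, cluster) tuples (assumed layout).
    Returns {(origin, dest): (n_employees, n_days, median_hour)}.
    """
    by_trip = defaultdict(list)
    for employee, day, hour, cluster in visits:
        by_trip[(employee, day)].append((hour, cluster))

    people, days, hours = defaultdict(set), defaultdict(set), defaultdict(list)
    for (employee, day), stops in by_trip.items():
        stops.sort()  # order each employee's day by hour of arrival
        for (_, origin), (hour, dest) in zip(stops, stops[1:]):
            if origin == dest:
                continue
            edge = (origin, dest)
            people[edge].add(employee)
            days[edge].add(day)
            hours[edge].append(hour)

    return {e: (len(people[e]), len(days[e]), median(hours[e])) for e in hours}


def routine_edges(edges, min_employees, min_days):
    """Keep only edges that enough employees use on enough days."""
    return {e: v for e, v in edges.items()
            if v[0] >= min_employees and v[1] >= min_days}
```

Filtering on both counts is what separates the routine maps from the complete movement records: an edge used once by one person is dropped, while the commute to GAStech survives.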

 

 

 

MC2.2 Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a. What is the pattern or event you observe?

b. Who is involved?

c. What locations are involved?

d. When does the pattern or event take place?

e. Why is this pattern or event significant?

f. What is your level of confidence about this pattern or event?  Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

 

Possible Credit Card Fraud

A comparison between transaction values in the credit card and loyalty card data reveals a large number of mismatches. Fig.5a shows the transaction mismatches that GAStech employees encounter. The bottom five individuals in the plot do not possess transaction data in either or both of the credit card and loyalty card datasets. Fig.5b compares the number of mismatches to legitimate transactions at each location. We observe that 26 of 34 locations are subject to mismatching transactions, and the five locations with the highest number of mismatches are (1) Hippokampos, (2) Brew’ve Been Served, (3) Guy’s Gyros, (4) Abila Zachoros and (5) Hallowed Grounds. While the distribution of mismatches is proportional to the total number of transactions at each location, Katarina’s Cafe remains free of mismatches despite hosting the largest number of transactions. Although these locations are routinely visited by GAStech employees (fig.4), the lack of a distinct pattern suggests that there is no targeted attack on a specific group of individuals. Temporal analysis of the data (fig.6b) shows that the mismatches peak four times a day and are most severe at exactly 12pm. Fig.6a indicates that the majority of the mismatches occur at $20, $40, $60 or $80. The combination of discrete mismatch values and times of day is unusual, suggesting that a systematic process is involved. While this hints at a possible case of credit card fraud, the discovery may not be directly related to the missing GAStech employees.

 

Mismatch Between Credit Card and Loyalty Card Transaction

a) Number of Mismatches by Nationality

b) Number of Mismatches by Location

Fig.5. Bar plots illustrating the number of transaction mismatches between credit card and loyalty card by the (a) nationality of employees and (b) location.

 

Mismatch Between Credit Card and Loyalty Card Transaction

a) Number of Mismatches by Value

b) Number of Mismatches by Time of Day

Fig.6. Comparison of the number of transaction mismatches against: (a) the value of each mismatch and (b) the time of day at which mismatches occur.
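
A minimal sketch of how such mismatches could be tallied is given below. It assumes that both datasets have been reduced to (employee, date, location, amount) records and that a credit card record is paired with a loyalty card record sharing the same employee, date and location. The pairing rule and the function names are assumptions made for illustration.

```python
from collections import Counter


def find_mismatches(credit, loyalty):
    """Compare amounts for purchases recorded in both datasets.

    Both inputs are lists of (employee, date, location, amount). Records are
    paired on (employee, date, location); a pair whose amounts differ is a
    mismatch. Returns a list of (key, credit_amount, loyalty_amount).
    """
    loyalty_by_key = {}
    for employee, date, location, amount in loyalty:
        loyalty_by_key.setdefault((employee, date, location), []).append(amount)

    mismatches = []
    for employee, date, location, amount in credit:
        key = (employee, date, location)
        candidates = loyalty_by_key.get(key, [])
        if not candidates:
            continue                      # no loyalty record to compare with
        if amount in candidates:
            candidates.remove(amount)     # consume the exact match
        else:
            mismatches.append((key, amount, candidates.pop(0)))
    return mismatches


def difference_histogram(mismatches):
    """Count mismatches by absolute difference, rounded to whole dollars."""
    return Counter(round(abs(cc - lc)) for _, cc, lc in mismatches)
```

When an employee makes several purchases at one location on the same day, this greedy pairing can pair the wrong records, so counts for busy venues should be read with care.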

 

 

Unique Gatherings

Gatherings may be inferred when employees are spatially and temporally collocated in the same cluster. The accuracy of our inference model is determined by the clustering proximity described in MC2.1. The flow diagram in fig.7 illustrates the gatherings that we detected. Each spline in the diagram represents an employee and is color-coded to reflect employment type. Each box represents a gathering and is numbered to correspond with a cluster on the tourist map in fig.2b. The width of each box encodes the duration of a gathering while its height represents the number of people involved. We detect two unique gatherings, involving only Tethyians, that last for more than 3 hours. These events do not recur in the data. The highlighted box in fig.7a indicates a large gathering at Carnero Street involving 14 GAStech employees from Engineering and IT. It is likely to have taken place at Felix Balas' residence. The highlighted box in fig.7b shows the executive team gathering at Desafio Golf Course on a Sunday. The event involves GAStech CEO Sten Sanjorge Jr., who would have been a likely target for the Protectors of Kronos. Law enforcement should investigate whether any of the participants noticed suspicious individuals or activities in the vicinity.

 

Detecting Unique Gatherings Based on Spatial Temporal Colocation

a) Large Gathering at Carnero Street

b) Executives Gathering at Desafio Golf Course

Fig.7. Flow diagram illustrating two instances where unique gatherings occurred. Each spline in the diagram represents an employee and is color-coded to reflect employment type. Each box represents a gathering and is numbered to correspond with a cluster on the tourist map in fig.2b. The width of each box encodes the duration of a gathering while its height represents the number of people involved.
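
Spatial-temporal colocation of this kind can be sketched with a sweep over arrival and departure events per cluster. The example below is an illustration only. It assumes that stationary periods are available as (employee, cluster, start, end) records with comparable start and end times, and the function name and the min_people parameter are hypothetical.

```python
from collections import defaultdict


def detect_gatherings(stays, min_people=2):
    """Find periods when at least `min_people` employees share a cluster.

    `stays` is a list of (employee, cluster, start, end) (assumed layout).
    Returns a list of (cluster, start, end, participants).
    """
    events = defaultdict(list)
    for employee, cluster, start, end in stays:
        events[cluster].append((start, 1, employee))   # arrival
        events[cluster].append((end, 0, employee))     # departure sorts first on ties

    gatherings = []
    for cluster, evs in events.items():
        present, began, seen = set(), None, set()
        for time, is_arrival, employee in sorted(evs):
            if is_arrival:
                present.add(employee)
            else:
                present.discard(employee)
            if len(present) >= min_people:
                if began is None:
                    began, seen = time, set()
                seen |= present            # everyone present at any point
            elif began is not None:
                gatherings.append((cluster, began, time, frozenset(seen)))
                began = None
    return gatherings
```

Each returned tuple maps onto one box in the flow diagram: the time span gives the width and the participant count gives the height. Filtering the result on duration and on the nationality of the participants would then isolate gatherings such as the two highlighted above.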

 

 

Suspicious Behavior by Kronesian GAStech Security Employees

We detect two clusters that are visited, on four occasions, only by Kronesian security employees with military experience. Loreto Bodrogi, Minke Mies, Inga Ferro and Hennie Osvaldo are the GAStech employees who visit these clusters between 11am and 1pm. Visits to these clusters on the weekend suggest that this activity is not related to work. Connectivity analysis of the road network indicates that both clusters are situated in less accessible localities in the industrial region of the map (fig.8a). Well-connected zones tend to have more traffic while less connected regions tend to be quieter. The lack of any transaction data or POI reference on the tourist map raises further suspicion.

 

Detecting Suspicious Locations and Employees

a) Locations of Suspicious Clusters

b) Suspicious Employees That Visited Both Clusters

Fig.8. Discovery of two suspicious clusters (a) and the Kronesian GAStech security employees (b) who visited both locations on four occasions. Saturation is employed to encode connectedness in (a).

 

 

Unusual Amount of Truck Movement

We discover an unusually high amount of truck movement on 16/01/14. Fig.9 shows a sharp increase in the number of waypoints generated by the onboard GPS devices of the trucks. These waypoints were also generated much later in the day than the daily average. The lack of anomalies in the delivery routes suggests that the drivers were driving slower, making more trips than their daily average, or both.

 

Fig.9. Histogram revealing an unusually high number of truck waypoints after 17:00 on 16/01/14.
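
The counting behind such a histogram can be sketched as follows. The comparison against the median daily count, the factor of 2 and the sample timestamps are assumptions chosen for illustration; they are not values taken from the challenge data.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def waypoints_per_day(timestamps, after_hour=None):
    """Count GPS waypoints per calendar day, optionally only at/after an hour."""
    counts = Counter()
    for ts in timestamps:
        if after_hour is None or ts.hour >= after_hour:
            counts[ts.date()] += 1
    return counts


def unusual_days(counts, factor=2.0):
    """Days whose count exceeds `factor` times the median daily count."""
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * typical)


# Hypothetical sample: three ordinary evenings and one busy one.
sample = ([datetime(2014, 1, d, 18, m) for d in (13, 14, 15) for m in range(3)]
          + [datetime(2014, 1, 16, 19, m) for m in range(10)])
evening_counts = waypoints_per_day(sample, after_hour=17)
busy_days = unusual_days(evening_counts)
```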

 

 

Large Transaction at Frydo’s Autosupply

An analysis of the transactions per employee reveals that Lucas Alcazar spent a large sum of money at Frydo's Autosupply on 13/01/14. Fig.10 compares the transaction values that occur at various locations. Larger values on the right of the plot belong to employees who operate trucks that shuttle between GAStech and various industrial locations. The locations where these transactions occur suggest work-related activities. The highlighted transaction is a distinct outlier. Though there are no records of what the transaction entails, the sum suggests that Alcazar might have purchased a vehicle. A key question raised by this conjecture is why he did so when he already has a car assigned.

 

Fig.10. Boxplot comparing the transaction values that occur at various locations. The highlighted transaction is a distinct outlier that should be investigated.
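
The outlier rule that a boxplot conventionally draws (points beyond 1.5 times the interquartile range from the quartiles) can be sketched as below. This is an illustration of the standard Tukey fences applied per location, assuming transactions are (employee, location, amount) records. It is not a description of how the plot in Fig.10 was produced, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_outliers(transactions, k=1.5):
    """Flag transactions outside the Tukey fences of their location.

    `transactions` is a list of (employee, location, amount) (assumed layout).
    """
    by_location = defaultdict(list)
    for employee, location, amount in transactions:
        by_location[location].append(amount)

    flagged = []
    for employee, location, amount in transactions:
        values = by_location[location]
        if len(values) < 4:
            continue                      # too few points for stable quartiles
        q1, _, q3 = quantiles(values, n=4)
        spread = q3 - q1
        if amount < q1 - k * spread or amount > q3 + k * spread:
            flagged.append((employee, location, amount))
    return flagged
```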

 

 

Unusual Transactions at Kronosmart

An unusual transaction by Lucas Alcazar and Ada Campo-Corrente at Kronosmart on 19/01/14 took place without any supporting trace of vehicle movement. The transaction, occurring at 4am on a Sunday, suggests that both employees could have arranged to visit Kronosmart together and made their way there without using their company-provided vehicles. The lack of GPS data further suggests that the visit could have been a deliberate attempt to be discreet. It is possible that they used the car that Alcazar might have purchased on 13/01/14.

 

Fig.11. Timeline visualization shows employee activity over the course of one day. A thin horizontal grey line indicates that an employee is stationary while a thick horizontal grey line indicates that a transaction was made within the cluster where he/she was stationary. A thin vertical line in magenta indicates the transaction time. The highlighted region of the timeline illustrates examples where a transaction was made without supporting GPS information.
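
The check that the timeline supports visually can be sketched as a lookup of each transaction against the stationary intervals derived from the GPS data. The record layouts and the function name below are assumptions made for illustration.

```python
def unsupported_transactions(transactions, stays):
    """Return transactions with no stationary GPS interval to back them up.

    `transactions` is a list of (employee, location, time); `stays` is a list
    of (employee, location, start, end) derived from the GPS data. Times only
    need to be mutually comparable (assumed layout).
    """
    flagged = []
    for employee, location, time in transactions:
        supported = any(
            who == employee and where == location and start <= time <= end
            for who, where, start, end in stays
        )
        if not supported:
            flagged.append((employee, location, time))
    return flagged
```

A flagged record only shows that the company vehicle was not parked at that location at the time; it does not by itself establish how the employee got there.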

 

 

MC2.3 Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data.  Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2.  Please limit your response to no more than five images and 300 words.

 

Our analysis assumes that movement across the map involves a car and that employees only commute with the company-supplied vehicle. We have learnt that this is not always true, as employees may choose other forms of transport or simply walk, since Abila is only approximately 9.68km x 5.37km. The second assumption concerns location detection (fig.2). We assumed that either credit card transactions or loyalty card records can account for the activities taking place within a given cluster. However, employees need not make a transaction at every venue they visit, and a transaction (e.g. an online purchase) need not imply a visit, so not all activities can be accounted for. Finally, we employ an automatic process to assign the employees from facilities to their respective trucks. The algorithm assumes that each employee is assigned to one truck for a day and does not consider the possibility of switching trucks in the middle of the day.
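
One way such an assignment could work is sketched below, to make the one-truck-per-day assumption concrete. The matching rule is an assumption and not a description of our implementation: each card transaction votes for every truck that was stopped at the same location at that time, and the truck with the most votes on a given day is assigned to the employee. All names and record layouts are hypothetical.

```python
from collections import Counter


def assign_trucks(truck_stops, transactions):
    """Assign one truck per employee per day by counting matching stops.

    `truck_stops`: list of (truck_id, day, location, start, end).
    `transactions`: list of (employee, day, location, time).
    Returns {(employee, day): truck_id}.
    """
    votes = Counter()
    for employee, day, location, time in transactions:
        for truck_id, stop_day, stop_location, start, end in truck_stops:
            if stop_day == day and stop_location == location and start <= time <= end:
                votes[(employee, day, truck_id)] += 1

    best = {}
    for (employee, day, truck_id), n in votes.items():
        current = best.get((employee, day))
        if current is None or n > current[1]:
            best[(employee, day)] = (truck_id, n)
    return {key: truck for key, (truck, _) in best.items()}
```

Because only the single best truck is kept per day, a driver who switches trucks at midday is attributed entirely to one of them, which is the limitation noted above.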